From: Joey Hess Date: Mon, 15 Sep 2025 15:59:37 +0000 (-0400) Subject: fix p2phttp worker thread leak with deleted repository LOCKCONTENT X-Git-Tag: archive/raspbian/10.20251029-1+rpi1~1^2~3^2~118 X-Git-Url: https://dgit.raspbian.org/%22http://www.example.com/cgi/%22/%22http:/www.example.com/cgi/%22?a=commitdiff_plain;h=91dbcf0b56ba540a33ea5a79ed52f33e82f4f61b;p=git-annex.git fix p2phttp worker thread leak with deleted repository LOCKCONTENT p2phttp: Fix a hang that could occur when used with --directory, and a repository in the repository got removed. It could leak up to -J number of worker threads, but this only affected a client trying to access the deleted repository. It may be that this could also affect a non-deleted repository, and also leak a worker thread, if invalid p2p protocol is sent. --- diff --git a/CHANGELOG b/CHANGELOG index cb3a0e0c15..371f53f30f 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -13,6 +13,8 @@ git-annex (10.20250829) UNRELEASED; urgency=medium * Improve performance when used with a local git remote that has a large working tree. * Removed support for building with cryptonite, use crypton. + * p2phttp: Fix a hang that could occur when used with --directory, + and a repository in the repository got removed. -- Joey Hess Fri, 29 Aug 2025 12:34:06 -0400 diff --git a/P2P/Http/Server.hs b/P2P/Http/Server.hs index 6e3d530303..88e6fa3367 100644 --- a/P2P/Http/Server.hs +++ b/P2P/Http/Server.hs @@ -477,14 +477,19 @@ serveLockContent mst su apiver (B64Key k) cu bypass sec auth = do let lock = do lockresv <- newEmptyTMVarIO unlockv <- newEmptyTMVarIO + -- A single worker thread takes the lock, and keeps running +- -- until unlock in order to keep the lock held. annexworker <- async $ inAnnexWorker st $ do lockres <- runFullProto (clientRunState conn) (clientP2PConnection conn) $ do net $ sendMessage (LOCKCONTENT k) checkSuccess liftIO $ atomically $ putTMVar lockresv lockres - liftIO $ atomically $ takeTMVar unlockv - void $ runFullProto (clientRunState conn) (clientP2PConnection conn) $ do - net $ sendMessage UNLOCKCONTENT + case lockres of + Right True -> do + liftIO $ atomically $ takeTMVar unlockv + void $ runFullProto (clientRunState conn) (clientP2PConnection conn) $ do + net $ sendMessage UNLOCKCONTENT + _ -> return () atomically (takeTMVar lockresv) >>= \case Right True -> return (Just (annexworker, unlockv)) _ -> return Nothing diff --git a/doc/todo/p2phttp_serve_multiple_repositories.mdwn b/doc/todo/p2phttp_serve_multiple_repositories.mdwn index f2ad9c752e..47cf8ad8fd 100644 --- a/doc/todo/p2phttp_serve_multiple_repositories.mdwn +++ b/doc/todo/p2phttp_serve_multiple_repositories.mdwn @@ -19,3 +19,5 @@ I asked matrss if this would be useful for forgejo-aneksajo and he said very useful, although I think I can work with the limitation [of only 1]." [[!tag projects/INM7]] + +> [[done]] --[[Joey]] diff --git a/doc/todo/p2phttp_serve_multiple_repositories/comment_3_429520e5411c5785b63598ffee7dbb95._comment b/doc/todo/p2phttp_serve_multiple_repositories/comment_3_429520e5411c5785b63598ffee7dbb95._comment new file mode 100644 index 0000000000..c3dee29010 --- /dev/null +++ b/doc/todo/p2phttp_serve_multiple_repositories/comment_3_429520e5411c5785b63598ffee7dbb95._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2025-09-15T15:18:15Z" + content=""" +Seems the bug is specific to LOCKCONTENT. When doing other operations, +like CHECKPRESENT after the repo is deleted, the server returns +FAILURE and continues being able to serve more requests for that repo. + +Ah, the problem is that serveLockContent is running a block of actions in +a single inAnnexWorker call, which first sends on the LOCKCONTENT, then +blocks waiting for the unlock to arrive. Which never happens, so it remains +blocked there forever, consuming a worker thread. + +Fixed that, finally. +"""]]